LONGEST PREFIX MATCH FOR IP ROUTERS

TECHNICAL FIELD 

This invention relates generally to longest prefix match 
for Internet Protocol (IP) routers. 

BACKGROUND 

A router uses the destination address of every incoming
packet of data to decide the proper next-hop information of
the packet. High-speed routers are required to make these
decisions at the speed of several million packets per second.
Each search finds the longest prefix match of the destination
address among all stored prefixes in the router.

BRIEF DESCRIPTION OF DRAWINGS 

The foregoing features and other aspects of the invention 
will be described further in detail by the accompanying 
drawings, in which: 
FIG. 1 is a block diagram of a packet switched network.

FIG. 2 is a block diagram of the router of FIG. 1.

FIG. 3 is a block diagram of a tree data structure used
in the router of FIG. 2.

FIG. 4 is a block diagram of a trie data structure.

FIG. 5 is a flow diagram of a route add process.

FIG. 6 is a flow diagram of a longest prefix match look-up
process.

Like reference symbols in the various drawings indicate
like elements.

DETAILED DESCRIPTION 

Referring to FIG. 1, in a packet switched network 10, a
source 12 is connected to one or more routers 14 for
transmitting packets to one or more destinations 16. Each
router 14 includes a number of ports 18 that are connected to
various sources and destinations. Accordingly, a packet from
source 12 may pass through more than one router 14 prior to
arriving at its destination 16.

Referring to FIG. 2, each router 14 includes an input
switch 50, an output switch 52, a memory 54, a controller 56,
a number of input ports 58 and a number of output ports 60.
Associated with the controller 56 is a memory element 62 for
storing controller data. Each switch 50 and 52 is connected
to each input and output port 58 and 60, respectively, in the
router 14. In an embodiment, router 14 includes eight input
and output ports 58 and 60, respectively. In this embodiment,
the number of input ports and output ports is equal; however,
other embodiments may necessitate greater numbers of input
ports or output ports.

Associated with the controller 56 is a route look-up
engine (RLE) 64. In an embodiment, a number of route look-up
engines 64 are included in the controller 56, each receiving
look-up requests in round-robin fashion, so as to speed the
routing process. In another embodiment, memory element 62 is
a four bank static random access memory (SRAM) that includes
route look-up engines 64 to service at full bandwidth.

In operation, packets are received at an input port 58,
transferred to the input switch 50, and stored temporarily in
the memory 54. When the input switch 50 receives the packet,
a destination address is extracted from the first data block
in the packet and transferred to the controller 56. The input
switch 50 includes a transfer engine (not shown) for
transferring packets received from the input port 58 to memory
54.

Route look-up engine 64 searches a forwarding table (also
referred to as a routing table) residing in the controller 56
for the longest prefix match of the destination address. IP
version 4 (IPv4) forwarding tables include a set of routes
that is updated by routing protocols such as RIP (Routing
Information Protocol). Each route determines the outgoing
interface, i.e., output port 60, for a set of IP destination
addresses, which is represented by an IP address and a subnet
mask. Both IPv4 addresses and the subnet masks are 32 bit
numbers. In particular, if the Kth bit of the subnet mask is
1, it indicates that the Kth bit of the corresponding IP
address is significant; otherwise, it indicates that the Kth
bit is insignificant. For example, if the IP address 12345678
and the subnet mask FFFFF000 (both in hexadecimal format)
define a route, the set of addresses between 12345000 and
12345FFF belongs to this route. Each subnet mask includes
contiguous ones from the most significant bit. Therefore, a
route is defined by the prefix of the corresponding address.
The route look-up engine 64 performs a look-up in its IP
forwarding table to determine the longest prefix match. Once
a match has been determined, the route look-up engine 64
returns a result that includes the output port 60 associated
with the destination address.
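
The address and mask arithmetic in the example above can be checked with a short sketch. The function names (in_route, prefix_length, route_range, longest_prefix_match) are illustrative and do not appear in the specification, and the linear scan is only a reference model of the result a longest prefix match should return, not the table-based search described in this document.

```python
def in_route(addr: int, route_addr: int, mask: int) -> bool:
    """True if addr agrees with route_addr on every significant (mask = 1) bit."""
    return (addr & mask) == (route_addr & mask)


def prefix_length(mask: int) -> int:
    """Count the contiguous leading ones of a 32-bit subnet mask."""
    length = 0
    bit = 0x80000000
    while bit and (mask & bit):
        length += 1
        bit >>= 1
    # Reject masks whose ones are not contiguous from the most significant bit.
    if mask != (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF:
        raise ValueError("subnet mask must be contiguous ones from the MSB")
    return length


def route_range(route_addr: int, mask: int) -> tuple:
    """Lowest and highest address covered by the route."""
    low = route_addr & mask
    high = low | (~mask & 0xFFFFFFFF)
    return low, high


def longest_prefix_match(addr: int, routes: list):
    """Reference model: linear scan over (route_addr, mask, port) tuples."""
    best_port, best_len = None, -1
    for route_addr, mask, port in routes:
        if in_route(addr, route_addr, mask) and prefix_length(mask) > best_len:
            best_port, best_len = port, prefix_length(mask)
    return best_port


low, high = route_range(0x12345678, 0xFFFFF000)
print(f"{low:08X}-{high:08X}")
```

For the route in the text, the printed range is 12345000-12345FFF, and the mask FFFFF000 corresponds to a prefix length of 20.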

The search result and other information (e.g., source ID,
flow ID, packet length, quality of service and statistical
information) for routing the packet through the router 14
combine to form a notification. The notification is
transferred from the controller 56 to the output switch 52.
Upon receiving the notification the output switch 52 initiates
a transfer of the packet from memory 54 to the respective
output port 60 associated with the search result.

Certain packets allow routers to add new entries in
forwarding tables, while other packets inform the routers that
an entry in a routing table should be modified or deleted, and
the remaining packets follow the routes indicated by the
previously added forwarding table entries. An IP longest
prefix match process adds a new route entry into the
forwarding table, given an IP address and a prefix mask that
includes 32 bits, where the bits are a series of leading 1's
followed by all 0's. The route look-up, given a 32 bit IP
address only (no prefix mask), returns the route entry that
has the longest prefix match. It is important for routers to
perform the route look-up in the forwarding table in the
shortest possible time, or with the fewest accesses to memory,
in order to achieve high packet forwarding throughput. It may
also be desirable to update the forwarding table without
stopping packet forwarding altogether.

In an embodiment, route look-up engine (RLE) 64 performs
a longest prefix match process on two trees of tables
generated and written to by a core processor within the router
14.

Referring to FIG. 3, a data structure 70 includes a large 
table 72 at the root, branching to small tables 74, called 
trie tables. A traversal proceeds in parallel on the two
trees, using pipelined reads. Each trie table 74 is addressed 
by a span of IP destination address bits to locate an indexed 
table entry. Each indexed trie table entry can optionally 
contain a route entry pointer and a pointer to the next table. 
Each trie table 74 contains prefix match fields for each 
indexed entry, a population count of pointers, and hidden 
prefix entries that hold shorter prefix route entry pointers. 
A leaf entry is copied to other leaf entries in the same trie 
table 74 to enable matches on multiple IP destination 
addresses, yielding the route entry that has the longest 
prefix match. Hidden entries are copied to shared memory 
entries when a longer prefix entry is deleted. 

Each of the trie tables 74 is used to facilitate a fast
route look-up. Information structures, described below, are
used to store pointers and masks used only by the route table
look-up engine 64 during route add and delete. In an
embodiment, the trie tables 74 are kept in SRAM and are used
for look-ups as well as for route add/deletes. As described
above, each trie table 74 includes a set of trie entries. A
trie entry is a 32-bit longword, divided into two fields,
i.e., a route pointer and a trie pointer. If the trie entry
is zero, there is no route. If the trie pointer is non-zero,
there is a longer prefix entry to be found. If the route
pointer is non-zero, there is a route entry at the current
prefix length of this trie entry's table.
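
The three conditions on a trie entry can be sketched as follows. The text states only that the 32-bit longword is divided into a route pointer and a trie pointer; the equal 16-bit split, the placement of the route pointer in the upper half, and the function names are assumptions made for illustration.

```python
HALF_MASK = 0xFFFF   # assumed: each field is one 16-bit half word
ROUTE_SHIFT = 16     # assumed: route pointer occupies the upper half


def pack_entry(route_ptr: int, trie_ptr: int) -> int:
    """Build a 32-bit trie entry from its two fields."""
    if not (0 <= route_ptr <= HALF_MASK and 0 <= trie_ptr <= HALF_MASK):
        raise ValueError("pointers must fit in 16 bits")
    return (route_ptr << ROUTE_SHIFT) | trie_ptr


def unpack_entry(entry: int) -> tuple:
    """Split a 32-bit trie entry into (route_ptr, trie_ptr)."""
    return (entry >> ROUTE_SHIFT) & HALF_MASK, entry & HALF_MASK


def describe(entry: int) -> dict:
    """Apply the three rules from the text to one entry."""
    route_ptr, trie_ptr = unpack_entry(entry)
    return {
        "no_route": entry == 0,               # whole entry is zero
        "has_route": route_ptr != 0,          # route at this table's prefix length
        "has_longer_prefix": trie_ptr != 0,   # a longer prefix entry may follow
    }
```

An entry such as pack_entry(5, 9) would therefore report both a route at the current prefix length and a pointer to a further table.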

Referring to FIG. 4, a data structure 90 includes a hi64k
table 92 and a hi256 table 94. The hi64k table 92 is a single
64k entry table that is indexed by bits 31:16 of an IP
address. The prefix length associated with this table is
sixteen. The hi256 table 94 is a single 256 entry table that
is indexed by bits 31:24 of the IP address. The prefix length
associated with the hi256 table 94 is eight.
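
As a minimal sketch of the indexing just described, the two root-table indices are bit fields of the 32-bit address. The function names are illustrative.

```python
def hi64k_index(addr: int) -> int:
    """Bits 31:16 of the address select one of 65536 hi64k entries."""
    return (addr >> 16) & 0xFFFF


def hi256_index(addr: int) -> int:
    """Bits 31:24 of the address select one of 256 hi256 entries."""
    return (addr >> 24) & 0xFF
```

For the address 12345678 (hexadecimal), the hi64k index is 1234 and the hi256 index is 12.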

There are multiple trie blocks, e.g., trie block 96 and
98, that are dynamically allocated as needed. Each of the trie
blocks 96 and 98 contains sixteen entries. Extending from the
hi64k table 92 or hi256 table 94, trie blocks form a tree with
each node representing 4 bits of addresses, and covering an
extension of 1-4 bits of prefix from the previous trie node.
In an embodiment, trie nodes are kept in SRAM.
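
Because each trie block is addressed by 4 bits, the remaining address bits below a root table can be viewed as a sequence of nibbles, one per tree level. The sketch below assumes the nibbles are consumed from the most significant remaining bits downward; the function name and the start_bits parameter are illustrative.

```python
def nibbles_after(addr: int, start_bits: int) -> list:
    """4-bit trie block indices that follow the first start_bits bits.

    start_bits is 16 below the hi64k table and 8 below the hi256 table.
    """
    out = []
    shift = 32 - start_bits - 4
    while shift >= 0:
        out.append((addr >> shift) & 0xF)
        shift -= 4
    return out
```

Under this reading, an address of 12345678 (hexadecimal) yields the indices 5, 6, 7, 8 below the hi64k table, and 3, 4, 5, 6, 7, 8 below the hi256 table.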

Each trie block 96 and 98 has a corresponding trie block 
information structure. Specifically, trie block 96 has trie 
block information structure 100 and trie block 98 has trie 
block information structure 102. Each trie block information
structure 100 associated with the hi64k table 92, for example,
includes a prefix 1 [2] member 104, a prefix 2 [4] member 106,
a prefix 3 [8] member 108 and a mask [16] member 110.

The prefix 1 [2] member 104 is used to hold route entry
pointers for each entry with prefix mask field length of 1.
The prefix 2 [4] member 106 is used to hold route entry
pointers for each entry with prefix mask field length of 2.
The prefix 3 [8] member 108 holds route entry pointers for
each entry with prefix mask field length of 3. The mask [16]
member 110 contains a 4-bit prefix mask field to indicate the
prefix length of the associated trie entry.

The trie information structure 102 associated with the
hi256 table 94 includes a prefix 1 [2] member 112, a prefix 2
[4] member 114, a prefix 3 [8] member 116, a prefix 4 [16]
member 118, a prefix 5 [32] member 120, a prefix 6 [64] member
122, a prefix 7 [128] member 124 and a mask [256] member 126.
Each of the members is used as follows:

prefix 1 [2] 112: holds route entry pointers for entries
with prefix mask field length of 1;

prefix 2 [4] 114: holds route entry pointers for entries
with prefix mask field length of 2;

prefix 3 [8] 116: holds route entry pointers for entries
with prefix mask field length of 3;

prefix 4 [16] 118: holds route entry pointers for entries
with prefix mask field length of 4;

prefix 5 [32] 120: holds route entry pointers for entries
with prefix mask field length of 5;

prefix 6 [64] 122: holds route entry pointers for entries
with prefix mask field length of 6;

prefix 7 [128] 124: holds route entry pointers for
entries with prefix mask field length of 7; and

mask [256] 126: contains an 8-bit prefix mask field
to indicate the prefix length of the associated entry.
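
The two information structures share one shape: for a table indexed by n bits, there is one prefix member of size 2^k for each shorter length k from 1 to n-1, plus a mask member with one slot per table entry. The sketch below models that shape generically. The class name, the dictionary layout, and in particular the hidden_slot rule (indexing a prefix member by the top k bits of the table index) are assumptions, since the text gives only the member sizes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InfoStructure:
    """Side structure for a table indexed by index_bits bits (4 or 8 here)."""
    index_bits: int
    prefix: Dict[int, List[int]] = field(init=False)
    mask: List[int] = field(init=False)

    def __post_init__(self) -> None:
        # prefix[k] has 2**k slots, one per distinct k-bit prefix.
        self.prefix = {k: [0] * (1 << k) for k in range(1, self.index_bits)}
        # One prefix mask field per indexed table entry.
        self.mask = [0] * (1 << self.index_bits)

    def hidden_slot(self, index: int, length: int) -> int:
        """Assumed rule: the slot is the top `length` bits of the table index."""
        return index >> (self.index_bits - length)


trie_info = InfoStructure(4)    # prefix 1 [2], prefix 2 [4], prefix 3 [8], mask [16]
hi256_info = InfoStructure(8)   # prefix 1 [2] ... prefix 7 [128], mask [256]
```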

Referring to FIG. 5, a route add process 100 residing in
the controller 56 (of FIG. 2) receives 102 an incoming packet
and extracts 104 the destination address. The route add
process 100 determines 106 the number of trailing 4 bit
nibbles that are all 0s. This is a quick loop of maximum
eight iterations through the network prefix mask. A start
table can be either a 64khi_table or a 256hi_table.
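
The trailing-nibble count described above might look like the following loop, which inspects the network prefix mask one nibble at a time from the least significant end. The function name is illustrative.

```python
def trailing_zero_nibbles(prefix_mask: int) -> int:
    """Count trailing all-zero 4-bit nibbles of a 32-bit prefix mask."""
    count = 0
    for _ in range(8):          # at most eight nibbles in 32 bits
        if prefix_mask & 0xF:
            break
        count += 1
        prefix_mask >>= 4
    return count
```

For the mask FFFFF000 the count is 3, so the last nibble carrying any mask bits is the one at bits 15:12.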

The process 100 traverses 108 the trie beginning at the
start table, either the 64khi_table or the 256hi_table, going
from start trie nibble least significant bit (LSB) to end trie
nibble LSB. If a trie pointer exists 110 for a next node, the
process 100 saves 112 the pointer in a temporary array indexed
in the order of the trie traversal. If no trie pointer
exists, the process 100 allocates 114 a new pointer and saves
112 the new pointer in the temporary array. When the process
100 reaches the leaf nibble, the process 100 allocates 118 a
route entry and, depending on the portion of the network mask
for that nibble, writes 120 multiple route entry pointers to
the leaf trie block. To decide whether to write an entry,
process 100 uses a side mask set to hold the mask nibble for
an existing entry. The process 100 writes only if the new
entry mask is longer than the existing entry mask. If the new
entry nibble mask is 0xf, the process 100 will write one
entry. If the new entry nibble mask is 0xe, the process 100
will attempt to write two entries, as described below. When
the leaf has been written, the trie is filled in going from
leaf toward root. In this manner concurrent route lookups
will not be directed to non-existent entries. Finally, the
process 100 writes 122 the start table entry, thus completing
124 the update.

If the start table is hi256_table, the process 100
performs a multiple write for up to 128 entries based on
network prefix mask bits 31:24. If the submask is 0xff, the
process 100 writes one entry; if the submask is 0xfe, the
process 100 attempts to write two entries, and so forth.
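
The progression from one entry (0xff) to two (0xfe) and up to 128 follows from the number of trailing zero bits in the submask: each zero bit doubles the number of table indices the route covers. A hedged sketch, with an illustrative function name and a width parameter so the same rule applies to 4-bit nibble masks:

```python
def entries_covered(submask: int, width: int = 8) -> int:
    """Number of table entries a route with this submask spans."""
    full = (1 << width) - 1
    if not 0 < submask <= full:
        raise ValueError("submask must be non-zero and fit in the field")
    zeros = 0
    while not (submask >> zeros) & 1:
        zeros += 1
    # The submask must be contiguous ones from the top bit of the field.
    if submask != (full << zeros) & full:
        raise ValueError("submask must be contiguous ones from the top bit")
    return 1 << zeros
```

A one-bit submask of 0x80 gives the maximum of 128 entries mentioned in the text.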
In an embodiment, a pre-allocated variation of the
process 100 is employed, where a set of route adds are sorted
by length, and the 64khi_table is updated with multiple writes
starting with shortest prefix, going to longest. In this way
the process 100 does not need to compare the previous mask.
This is generally not preferred in systems where dynamic route
updates occur, because a single 64khi_table update may require
up to 32k read/writes, and seriously impact concurrent
forwarding performance.

When writing a leaf trie block, multiple entries may have
to be selectively written, depending on the mask and existing
entries. This sets up the leaf trie block for the longest
prefix match within the last 4 bits of the IP address that
have any corresponding network prefix mask bits set. A lookup
needs only to read the leaf trie block with a 4 bit address
index. During lookup the correct longest match entry will
be found. Writing multiple entries utilizes a side table that
holds the 4 bit masks for each of the 16 valid entries. If
the 4 bit mask is 0xf, the trie block and trie block mask
entries for just one location are written. If the 4 bit
mask is 0xe, entries for two locations are written, if the new
mask is longer than the mask in the trie block mask. If the 4
bit mask is 0xc, entries for four locations are written, if
the new mask is longer than the mask in the trie block mask.
If the 4 bit mask is 0x8, entries for eight locations are
written, if the new mask is longer than the mask in the trie
block mask. The single entry update is a read of the previous
trie pointer combined with the new route pointer.
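
The selective write can be sketched as below. This is a simplified model, not the described implementation: the leaf block is a list of sixteen route pointers (trie pointers are omitted), the side table is a parallel list of sixteen nibble masks with 0 meaning empty, and the names are illustrative. Because the masks are contiguous ones from the top bit, comparing them numerically is equivalent to comparing prefix lengths.

```python
# Number of leaf slots covered by each contiguous 4-bit nibble mask.
SPAN = {0xF: 1, 0xE: 2, 0xC: 4, 0x8: 8}


def write_leaf(block, side_mask, nibble, nibble_mask, route_ptr):
    """Write route_ptr into every slot the new route covers and should own.

    Returns the number of slots actually written.
    """
    span = SPAN[nibble_mask]
    base = nibble & nibble_mask          # first slot covered by the route
    written = 0
    for index in range(base, base + span):
        # A full 0xf mask always writes its single slot; shorter masks write
        # only where the new mask is longer than the one already recorded.
        if nibble_mask == 0xF or nibble_mask > side_mask[index]:
            block[index] = route_ptr
            side_mask[index] = nibble_mask
            written += 1
    return written


block, side = [0] * 16, [0] * 16
write_leaf(block, side, 0b1000, 0x8, 10)   # 1-bit prefix covers slots 8..15
write_leaf(block, side, 0b1010, 0xE, 20)   # 3-bit prefix covers slots 10..11
write_leaf(block, side, 0b1011, 0xF, 30)   # 4-bit prefix covers slot 11 only
```

After these three writes, a lookup that reads slot 11 finds route 30, slot 10 finds route 20, and the other slots from 8 to 15 find route 10, which is the longest match in each case.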

To support a delete entry function (described below), if
a sub mask length for the hi256 table is less than eight, or
for a trie block, is less than four, the route pointer is
saved in the information structure prefix member field
corresponding to the sub mask length. The prefix members can
later be retrieved and inserted in place of deleted entries
with longer sub mask length.

Referring to FIG. 6, a longest prefix match look-up
process 150 includes performing 152 two parallel depth-wise
tree searches, one starting at the hi64k table, and the other
starting at the hi256 table. A longest prefix match look-up
utilizes the tables set up by the route add process 100.
After the first table lookup, tree nodes represent 4 bits of
address. Each lookup includes 2 half words, i.e., a possible
pointer to the route entry (rt_ptr_long and rt_ptr_short), and
a possible pointer to the next node in the tree (trie_ptr_long
and trie_ptr_short). The process 150 determines 158 whether
there is a trie_ptr. If no trie_ptr exists, the resulting
rt_ptrs are compared 156. The process 150 determines whether
the rt_ptr_long is non-null. If non-null, the process 150
selects 160 the prefix as a match. Otherwise, if rt_ptr_short
is non-null, it is the match. If rt_ptr_short is also null,
process 150 reports 162 no match.
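
The look-up flow can be sketched as follows, under several stated assumptions: tables are modeled as Python dictionaries of (route pointer, trie pointer) pairs rather than SRAM words; the "long" result is taken to come from the tree rooted at the hi64k table and the "short" result from the tree rooted at the hi256 table; each walk remembers the last non-zero route pointer it passes; and the two walks run one after the other here, whereas the text describes them as parallel. All names other than rt_ptr_long and rt_ptr_short are illustrative.

```python
def nibbles_after(addr, start_bits):
    """4-bit indices following the first start_bits bits of the address."""
    return [(addr >> s) & 0xF for s in range(32 - start_bits - 4, -1, -4)]


def walk(start_table, blocks, start_index, nibbles):
    """Follow trie pointers from one root table; return the best route pointer."""
    route, trie = start_table.get(start_index, (0, 0))
    best = route
    for n in nibbles:
        if not trie:                 # no longer prefix to be found
            break
        route, trie = blocks[trie][n]
        if route:
            best = route             # assumed: deeper route is the longer match
    return best


def select_route(rt_ptr_long, rt_ptr_short):
    """Prefer the long-tree result, then the short-tree result, else no match."""
    if rt_ptr_long:
        return rt_ptr_long
    if rt_ptr_short:
        return rt_ptr_short
    return None


def lookup(addr, hi64k, hi256, blocks):
    rt_ptr_long = walk(hi64k, blocks, (addr >> 16) & 0xFFFF, nibbles_after(addr, 16))
    rt_ptr_short = walk(hi256, blocks, (addr >> 24) & 0xFF, nibbles_after(addr, 8))
    return select_route(rt_ptr_long, rt_ptr_short)


# Hypothetical tables: route 1 for prefix 12/8, route 2 for prefix 12345/20.
leaf = [(0, 0)] * 16
leaf[5] = (2, 0)
hi256 = {0x12: (1, 0)}
hi64k = {0x1234: (0, 7)}
blocks = {7: leaf}
```

With these hypothetical tables, an address under 12345xxx resolves to route 2 through the hi64k tree, while other addresses under 12xxxxxx fall back to route 1 from the hi256 table.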

Each trie block has sixteen leaves that are indexed by 4
bits of IP destination address. The full binary tree of
possible matches is 16+8+4+2, or 30 possibilities. If all of
these combinations are added (sixteen routes with submask
length 4, eight routes with submask length 3, four routes with
submask length 2 and two routes with submask length 1), there
will be fourteen hidden entries. Following this, if any one
of the longer submask entries is deleted, the deleted entry
will be replaced by the entry with the next longest submask.
The route add process 100 stores the hidden entries in
prefix 1-3 members of trie_info. The same is done for the
hi256 table, where hidden entries are stored in prefix 1-7
members of hi256_info.

In addition to deleting the route table entry, deleting a
route removes the corresponding trie entry, and traces back
through the prefix members (e.g., prefix 3, then prefix 2,
then prefix 1) to find the entry with the next longest prefix.
It then inserts the route pointer found there into the trie
block entry.
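
The trace-back through the prefix members might be sketched as below. The prefix members are modeled as a dictionary from prefix length to a list of route pointers, and each member is assumed to be indexed by the top bits of the table index; neither detail is stated in the text, and the function name is illustrative.

```python
def next_longest(prefix, index, index_bits, deleted_len):
    """Find the hidden route that should replace a deleted entry.

    Searches prefix[deleted_len - 1] down to prefix[1] and returns
    (route_ptr, length), or (0, 0) if no shorter prefix covers the index.
    """
    for length in range(deleted_len - 1, 0, -1):
        slot = index >> (index_bits - length)   # assumed slot rule
        route_ptr = prefix[length][slot]
        if route_ptr:
            return route_ptr, length
    return 0, 0


# Hypothetical trie block state: hidden routes of length 2 and length 1.
prefix = {1: [0, 10], 2: [0, 0, 20, 0], 3: [0] * 8}
replacement = next_longest(prefix, 0b1011, 4, 4)
```

In this hypothetical state, deleting the 4-bit entry at index 1011 exposes the length-2 route (pointer 20), and deleting that in turn would expose the length-1 route (pointer 10).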

A population count is kept for each trie block, to count
the number of trie pointers and route entry pointers. The
count is incremented when either of these is added, and
decremented when either of these is removed. When the
population goes to 0 as a result of a deletion, the trie block
is pushed back to a trie block freelist. When a route entry
is deleted, its contents are zeroed, and it is pushed back to
a route entry freelist.
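
A minimal sketch of the population count and trie block freelist follows. The class and method names are illustrative, and block identifiers start at 1 on the assumption that 0 is reserved as the null pointer.

```python
class TrieBlockPool:
    """Tracks per-block pointer counts and recycles empty trie blocks."""

    def __init__(self, n_blocks: int) -> None:
        self.free = list(range(1, n_blocks + 1))   # 0 is reserved as null
        self.population = {}

    def allocate(self) -> int:
        block = self.free.pop()
        self.population[block] = 0
        return block

    def add_pointer(self, block: int) -> None:
        """Call when a trie pointer or route entry pointer is added."""
        self.population[block] += 1

    def remove_pointer(self, block: int) -> bool:
        """Call on removal; returns True if the block was freed."""
        self.population[block] -= 1
        if self.population[block] == 0:
            del self.population[block]
            self.free.append(block)                # push back to the freelist
            return True
        return False
```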

A number of embodiments of the invention have been
described. Nevertheless, it will be understood that various
modifications may be made without departing from the spirit
and scope of the invention. Accordingly, other embodiments
are within the scope of the following claims.

WHAT IS CLAIMED IS:

1. A computer-implemented method of searching a
database for a prefix representing a destination address
comprising:
    loading two trees of tables, each tree of tables having a
large table at a root branching to small tables; and
    traversing the two tables of trees in parallel to find a
match of an entry to the prefix.

2. The computer-implemented method of claim 1 wherein an
entry comprises:
    a router pointer representing the destination address;
and
    a pointer to a next small table.

3. The computer-implemented method of claim 1 wherein the
small tables comprise:
    prefix match fields for indexed table entries;
    a population count of pointers; and
    hidden prefix entries that hold shorter prefix route
entry pointers.

4. The computer-implemented method of claim 1 further
comprising reporting a non-match if the prefix does not match
an entry.

5. The computer-implemented method of claim 1 wherein a
first large table is a single 64k entry table that is indexed
by bits 31:16 of an internet protocol (IP) address.

6. The computer-implemented method of claim 1 wherein a
second large table is a single 256 entry table that is indexed
by bits 31:24 of an internet protocol (IP) address.

7. The computer-implemented method of claim 5 wherein
the small tables are dynamically allocated and comprise:
    a tree with each node representing 4 bits of addresses
covering an extension of 1-4 bits of a prefix entry from a
previous tree.

8. The computer-implemented method of claim 6 wherein
the small tables are dynamically allocated and comprise:
    a tree with each node representing 4 bits of addresses
covering an extension of 1-4 bits of a prefix entry from a
previous tree.

9. A computer storage device storing a data structure
for managing prefixes representing internet protocol (IP)
destination addresses, the data structure comprising:
    two trees of tables, each tree of tables comprising:
    a trie block, the trie block including a route pointer
and a trie pointer;
    a trie information structure, the trie information
structure including masks and route entry pointers.

10. A computer-implemented method of searching a
collection of data comprising:
    searching a first table of trees and a second table of
trees for a received search term, each of the trees of the
first table and the second table containing a trie element and
a trie pointer, for a match of the search term with a trie
element;
    determining whether a trie pointer is non-null when the
trie element matches the search term;
    comparing a trie element in the tree of the first table
containing the null pointer with a trie element in the tree of
the second table containing the null pointer;
    reporting a match if the search term matches the trie
element in the first table of trees; and
    reporting a match if the search term matches the trie
element in the second table of trees.

11. The computer-implemented method of claim 10 wherein
the search term is a destination address.

12. The computer-implemented method of claim 11 wherein
the destination address is a prefix.

13. A computer program product, disposed on a computer
readable medium, for searching a database for a prefix
representing a destination address, the program comprising
instructions for causing a computer to:
    load two trees of tables, each tree of tables having a
large table at a root branching to small tables; and
    traverse the two tables of trees in parallel to find a
match of an entry to the prefix.

14. The computer program of claim 13 wherein an entry
comprises:
    a router pointer representing the destination address;
and
    a pointer to a next small table.

15. The computer program of claim 13 wherein the small
tables comprise:
    prefix match fields for indexed table entries;
    a population count of pointers; and
    hidden prefix entries that hold shorter prefix route
entry pointers.

16. The computer program of claim 13 further comprising
instructions for causing the computer to report a non-match if
the prefix does not match an entry.

17. The computer program of claim 13 wherein a first
large table is a single 64k entry table that is indexed by
bits 31:16 of an internet protocol (IP) address.

18. The computer program of claim 13 wherein a second
large table is a single 256 entry table that is indexed by
bits 31:24 of an internet protocol (IP) address.

19. The computer program of claim 17 wherein the small
tables are dynamically allocated and comprise a tree with each
node representing 4 bits of addresses covering an extension of
1-4 bits of a prefix entry from a previous tree.

20. The computer program of claim 18 wherein the small
tables are dynamically allocated and comprise a tree with each
node representing 4 bits of addresses covering an extension of
1-4 bits of a prefix entry from a previous tree.

21. The computer-implemented method of claim 3 further
comprising:
    adding entries; and
    deleting entries.

22. The computer-implemented method of claim 21 wherein
deleting entries comprises:
    removing corresponding trie entries;
    decrementing the population counter;
    determining an entry next longest prefix; and
    inserting the next longest prefix in the trie.

ABSTRACT

A method of searching a database for a prefix
representing a destination address including loading two trees
of tables, each tree of tables having a large table at a root
branching to small tables and traversing the two tables of
trees in parallel to find a match of an entry to the prefix.
An entry includes a router pointer representing the
destination address and a pointer to a next small table. The
small tables include prefix match fields for indexed table
entries, a population count of pointers and hidden prefix
entries that hold shorter prefix route entry pointers.